Goto

Collaborating Authors

 accuracy and stability


The Diversified Ensemble Neural Network

Neural Information Processing Systems

Ensemble is a general way of improving the accuracy and stability of learning models, especially for the generalization ability on small datasets. Compared with tree-based methods, relatively less works have been devoted to an in-depth study on effective ensemble design for neural networks. In this paper, we propose a principled ensemble technique by constructing the so-called diversified ensemble layer to combine multiple networks as individual modules. We theoretically show that each individual model in our ensemble layer corresponds to weights in the ensemble layer optimized in different directions. Meanwhile, the devised ensemble layer can be readily integrated into popular neural architectures, including CNNs, RNNs, and GCNs. Extensive experiments are conducted on public tabular datasets, images, and texts. By adopting weight sharing approach, the results show our method can notably improve the accuracy and stability of the original neural networks with ignorable extra time and space overhead.


MOSS: Multi-Objective Optimization for Stable Rule Sets

Liu, Brian, Mazumder, Rahul

arXiv.org Machine Learning

We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective optimization framework. Importantly, MOSS allows a practitioner to rapidly evaluate the trade-off between accuracy and stability in sparse rule sets in order to select an appropriate model. We develop a specialized cutting plane algorithm in our framework to rapidly compute the Pareto frontier between these two objectives, and our algorithm scales to problem instances beyond the capabilities of commercial optimization solvers. Our experiments show that MOSS outperforms state-of-the-art rule ensembles in terms of both predictive performance and stability.


Secure and Private Federated Learning: Achieving Adversarial Resilience through Robust Aggregation

Yang, Kun, Imam, Neena

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative machine learning across decentralized data sources without sharing raw data. It offers a promising approach to privacy-preserving AI. However, FL remains vulnerable to adversarial threats from malicious participants, referred to as Byzantine clients, who can send misleading updates to corrupt the global model. Traditional aggregation methods, such as simple averaging, are not robust to such attacks. More resilient approaches, like the Krum algorithm, require prior knowledge of the number of malicious clients, which is often unavailable in real-world scenarios. To address these limitations, we propose Average-rKrum (ArKrum), a novel aggregation strategy designed to enhance both the resilience and privacy guarantees of FL systems. Building on our previous work (rKrum), ArKrum introduces two key innovations. First, it includes a median-based filtering mechanism that removes extreme outliers before estimating the number of adversarial clients. Second, it applies a multi-update averaging scheme to improve stability and performance, particularly when client data distributions are not identical. We evaluate ArKrum on benchmark image and text datasets under three widely studied Byzantine attack types. Results show that ArKrum consistently achieves high accuracy and stability. It performs as well as or better than other robust aggregation methods. These findings demonstrate that ArKrum is an effective and practical solution for secure FL systems in adversarial environments.


Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Chen, Feng, Gou, Chenhui, Liu, Jing, Yang, Yang, Li, Zhaoyang, Zhang, Jiyuan, Sun, Zhenbang, Zhuang, Bohan, Wu, Qi

arXiv.org Artificial Intelligence

As multimodal large language models (MLLMs) advance rapidly, rigorous evaluation has become essential, providing further guidance for their development. In this work, we focus on a unified and robust evaluation of \textbf{vision perception} abilities, the foundational skill of MLLMs. We find that existing perception benchmarks, each focusing on different question types, domains, and evaluation metrics, introduce significant evaluation variance, complicating comprehensive assessments of perception abilities when relying on any single benchmark. To address this, we introduce \textbf{AbilityLens}, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics. With the assistance of AbilityLens, we: (1) identify the strengths and weaknesses of current models, highlighting stability patterns and revealing a notable performance gap between open-source and closed-source models; (2) introduce an online evaluation mode, which uncovers interesting ability conflict and early convergence phenomena during MLLM training; and (3) design a simple ability-specific model merging method that combines the best ability checkpoint from early training stages, effectively mitigating performance decline due to ability conflict. The benchmark and online leaderboard will be released soon.


The Diversified Ensemble Neural Network

Neural Information Processing Systems

Ensemble is a general way of improving the accuracy and stability of learning models, especially for the generalization ability on small datasets. Compared with tree-based methods, relatively less works have been devoted to an in-depth study on effective ensemble design for neural networks. In this paper, we propose a principled ensemble technique by constructing the so-called diversified ensemble layer to combine multiple networks as individual modules. We theoretically show that each individual model in our ensemble layer corresponds to weights in the ensemble layer optimized in different directions. Meanwhile, the devised ensemble layer can be readily integrated into popular neural architectures, including CNNs, RNNs, and GCNs.


Weather Prediction Using CNN-LSTM for Time Series Analysis: A Case Study on Delhi Temperature Data

Li, Bangyu, Qian, Yang

arXiv.org Artificial Intelligence

As global climate change intensifies, accurate weather forecasting is increasingly crucial for sectors such as agriculture, energy management, and environmental protection. Traditional methods, which rely on physical and statistical models, often struggle with complex, nonlinear, and time-varying data, underscoring the need for more advanced techniques. This study explores a hybrid CNN-LSTM model to enhance temperature forecasting accuracy for the Delhi region, using historical meteorological data from 1996 to 2017. We employed both direct and indirect methods, including comprehensive data preprocessing and exploratory analysis, to construct and train our model. The CNN component effectively extracts spatial features, while the LSTM captures temporal dependencies, leading to improved prediction accuracy. Experimental results indicate that the CNN-LSTM model significantly outperforms traditional forecasting methods in terms of both accuracy and stability, with a mean square error (MSE) of 3.26217 and a root mean square error (RMSE) of 1.80615. The hybrid model demonstrates its potential as a robust tool for temperature prediction, offering valuable insights for meteorological forecasting and related fields. Future research should focus on optimizing model architecture, exploring additional feature extraction techniques, and addressing challenges such as overfitting and computational complexity. This approach not only advances temperature forecasting but also provides a foundation for applying deep learning to other time series forecasting tasks.


Streamlining Redundant Layers to Compress Large Language Models

Chen, Xiaodong, Hu, Yuxuan, Zhang, Jing, Wang, Yanling, Li, Cuiping, Chen, Hong

arXiv.org Artificial Intelligence

This paper introduces LLM-Streamline, a novel layer pruning approach for large language models. It is based on the observation that different layers have varying impacts on hidden states, enabling the identification of less important layers. LLM-Streamline comprises two parts: layer pruning, which removes consecutive layers with the lowest importance based on target sparsity, and layer replacement, where a lightweight network is trained to replace the pruned layers to mitigate performance loss. Additionally, a new metric called "stability" is proposed to address the limitations of accuracy in evaluating model compression. Experiments show that LLM-Streamline surpasses previous state-of-the-art pruning methods in both accuracy and stability.


ASI: Accuracy-Stability Index for Evaluating Deep Learning Models

Dai, Wei, Berleant, Daniel

arXiv.org Artificial Intelligence

In the context of deep learning research, where model introductions continually occur, the need for effective and efficient evaluation remains paramount. Existing methods often emphasize accuracy metrics, overlooking stability. To address this, the paper introduces the Accuracy-Stability Index (ASI), a quantitative measure incorporating both accuracy and stability for assessing deep learning models. Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation. This paper addresses the important issue of quantitative benchmarking metrics for deep learning models, providing a new approach for accurately evaluating accuracy and stability of deep learning models. The paper concludes with discussions on potential weaknesses and outlines future research directions.


To be or not to be stable, that is the question: understanding neural networks for inverse problems

Evangelista, Davide, Nagy, James, Morotti, Elena, Piccolomini, Elena Loli

arXiv.org Artificial Intelligence

The solution of linear inverse problems arising, for example, in signal and image processing is a challenging problem since the ill-conditioning amplifies the noise on the data. Recently introduced algorithms based on deep learning overwhelm the more traditional model-based approaches, but they typically suffer from instability with respect to data perturbation. In this paper, we theoretically analyze the trade-off between neural networks stability and accuracy in the solution of linear inverse problems. Moreover, we propose different supervised and unsupervised solutions to increase network stability that maintains good accuracy by inheriting, in the network training, regularization from a model-based iterative scheme. Extensive numerical experiments on image deblurring confirm the theoretical results and the effectiveness of the proposed deep learning-based solutions to stably solve noisy inverse problems.


Adversarial Robustness of Deep Convolutional Candlestick Learner

Chen, Jun-Hao, Chen, Samuel Yen-Chi, Tsai, Yun-Cheng, Shur, Chih-Shiang

arXiv.org Machine Learning

Deep learning (DL) has been applied extensively in a wide range of fields. However, it has been shown that DL models are susceptible to a certain kinds of perturbations called \emph{adversarial attacks}. To fully unlock the power of DL in critical fields such as financial trading, it is necessary to address such issues. In this paper, we present a method of constructing perturbed examples and use these examples to boost the robustness of the model. Our algorithm increases the stability of DL models for candlestick classification with respect to perturbations in the input data.